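A growing body of literature has demonstrated the feasibility of using radio-frequency (RF) signals to perform key computer vision tasks under occlusion and poor lighting. This work leverages RF signals that traverse walls and occlusions to enable through-wall pose estimation, action recognition, scene captioning, and human re-identification. However, unlike RGB datasets, which can be labeled by human workers, labeling RF signals is a daunting task because these signals are not human-interpretable. Collecting unlabeled RF signals, on the other hand, is very easy. It would therefore be highly beneficial to use such unlabeled RF data to learn useful representations in an unsupervised manner. In this paper, we explore the feasibility of adapting RGB-based unsupervised representation learning to RF signals. We show that while contrastive learning has emerged as the dominant technique for unsupervised representation learning from images and videos, it performs poorly when applied to sensing humans with RF signals. In contrast, predictive unsupervised learning methods learn high-quality representations that can be used for multiple RF-based sensing tasks. Our empirical results show that this approach outperforms state-of-the-art RF-based human sensing on a variety of tasks, opening the possibility of learning from this novel modality.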
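Real-world data often exhibits long-tailed distributions with heavy class imbalance, where the majority classes can dominate the training process and shift the decision boundaries of the minority classes. Recently, researchers have investigated the potential of supervised contrastive learning for long-tailed recognition and shown that it provides a strong performance gain. In this paper, we show that while supervised contrastive learning can help improve performance, past baselines suffer from poor uniformity caused by the imbalanced data distribution. This poor uniformity manifests as poor separability of minority-class samples in the feature space. To address this problem, we propose targeted supervised contrastive learning (TSC), which improves the uniformity of the feature distribution on the hypersphere. TSC first generates a set of targets uniformly distributed on a hypersphere. It then makes the features of different classes converge to these distinct, uniformly distributed targets during training. This forces all classes, including minority classes, to maintain a uniform distribution in the feature space, which improves class boundaries and provides better generalization even in the presence of long-tailed data. Experiments on multiple datasets show that TSC achieves state-of-the-art performance on long-tailed recognition tasks.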
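The first TSC step, generating class targets spread uniformly over a hypersphere, can be illustrated with a small sketch. This is not the authors' procedure: we assume a Thomson-style repulsion (points push each other apart with a 1/distance^2 force and are re-normalized to unit length), and the function names, step size, clipping value, and seed are all our own choices.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length, i.e. project it onto the unit hypersphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def uniform_targets(num_classes, dim, steps=1000, lr=0.1, max_step=0.5, seed=0):
    """Spread num_classes unit vectors over the sphere in R^dim (assumed repulsion scheme).

    Each step moves every point along sum_j (u_i - u_j) / ||u_i - u_j||^3, the
    negative gradient of a Coulomb-style 1/distance energy, then re-normalizes it.
    """
    rng = random.Random(seed)
    targets = [normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
               for _ in range(num_classes)]
    for _ in range(steps):
        updated = []
        for i, u in enumerate(targets):
            force = [0.0] * dim
            for j, w in enumerate(targets):
                if i == j:
                    continue
                diff = [a - b for a, b in zip(u, w)]
                dist = math.sqrt(sum(d * d for d in diff)) + 1e-12
                force = [f + d / dist ** 3 for f, d in zip(force, diff)]
            step = [lr * f for f in force]
            norm = math.sqrt(sum(s * s for s in step))
            if norm > max_step:  # clip huge moves caused by near-collisions
                step = [s * max_step / norm for s in step]
            updated.append(normalize([a + s for a, s in zip(u, step)]))
        targets = updated  # synchronous update of all points
    return targets


def cosine(u, v):
    """Cosine similarity of two unit vectors (a plain dot product)."""
    return sum(a * b for a, b in zip(u, v))
```

For four classes in three dimensions the targets should approach a regular tetrahedron (pairwise cosine -1/3). During training, each class's features would then be pulled toward its assigned target; that assignment step is not shown here.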
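Contrastive learning (CL) can learn generalizable feature representations and achieve state-of-the-art performance on downstream tasks by finetuning a linear classifier on top of them. However, as adversarial robustness becomes vital in image classification, it remains unclear whether CL can preserve robustness for downstream tasks. The main challenge is that in the "self-supervised pretraining + supervised finetuning" paradigm, adversarial robustness is easily forgotten because of the mismatch between the pretraining and finetuning tasks. We call this challenge "cross-task robustness transferability". To address it, in this paper we revisit and advance CL principles through the lens of robustness enhancement. We show that (1) the design of contrastive views matters: high-frequency components of images are beneficial to model robustness; and (2) augmenting CL with a pseudo-supervision stimulus (e.g., via feature clustering) helps preserve robustness without forgetting. Equipped with these new designs, we propose AdvCL, a novel adversarial contrastive pretraining framework. We show that AdvCL enhances cross-task robustness transferability without loss of model accuracy or finetuning efficiency. Through a thorough experimental study, we demonstrate that AdvCL outperforms state-of-the-art self-supervised robust learning methods across multiple datasets (CIFAR-10, CIFAR-100, and STL-10) and finetuning schemes (linear evaluation and full-model finetuning).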
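The claim that high-frequency image components make useful contrastive views can be made concrete with a simple decomposition: high-frequency part = image minus a low-pass (blurred) copy. This is a minimal sketch on a grayscale image stored as nested lists; the box-blur low-pass filter and its radius are our assumptions, and AdvCL's actual filtering may differ.

```python
def box_blur(img, radius=1):
    """Low-pass filter: mean over a (2*radius+1)^2 window, using only in-bounds pixels."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total, count = 0.0, 0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        total += img[y][x]
                        count += 1
            out[i][j] = total / count
    return out


def high_frequency(img, radius=1):
    """High-frequency component: the image minus its blurred (low-frequency) version."""
    low = box_blur(img, radius)
    return [[p - q for p, q in zip(row, low_row)] for row, low_row in zip(img, low)]
```

A flat image has no high-frequency content, while edges and isolated spikes survive the subtraction; a view built from `high_frequency(img)` therefore emphasizes exactly the fine detail that the abstract reports as helpful for robustness.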
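Contrastive learning is one of the fastest-growing research areas in machine learning because it can learn useful representations without labeled data. However, contrastive learning is susceptible to feature suppression: it may discard important information relevant to the task of interest and learn irrelevant features instead. Past work has addressed this limitation with handcrafted data augmentations that eliminate irrelevant information. However, this approach does not work for all datasets and tasks. Moreover, data augmentation fails to address feature suppression in multi-attribute classification, where one attribute can suppress features relevant to other attributes. In this paper, we analyze the objective function of contrastive learning and formally prove that it is susceptible to feature suppression. We then propose predictive contrastive learning (PCL), a framework for learning unsupervised representations that are robust to feature suppression. The key idea is to force the learned representation to predict the input, which prevents it from discarding important information. Extensive experiments verify that PCL is robust to feature suppression and outperforms state-of-the-art contrastive learning methods on a variety of datasets and tasks.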
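The key idea, adding an input-prediction term to a contrastive objective, can be sketched as a combined loss. This is a minimal illustration, not the paper's exact formulation: we assume an InfoNCE contrastive term on embedding vectors and a mean-squared-error reconstruction term weighted by a hypothetical `weight`, with the decoder that produces `prediction` left abstract.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE: -log( exp(a.p / t) / sum over {p} + negatives of exp(a.k / t) )."""
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[0]


def reconstruction_mse(inputs, prediction):
    """Prediction term: how well the input is recovered from the representation."""
    return sum((a - b) ** 2 for a, b in zip(inputs, prediction)) / len(inputs)


def predictive_contrastive_loss(anchor, positive, negatives, inputs, prediction, weight=1.0):
    """Contrastive loss plus a weighted input-prediction (reconstruction) penalty."""
    return info_nce(anchor, positive, negatives) + weight * reconstruction_mse(inputs, prediction)
```

The contrastive term alone is satisfied by any feature that separates positives from negatives; the reconstruction term is what penalizes a representation that throws away input information, which is the mechanism the abstract credits for resisting feature suppression.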
Image-based head swapping aims to stitch a source head onto another source body seamlessly. This seldom-studied task faces two major challenges: 1) preserving the head and body from different sources while generating a seamless transition region, and 2) the absence, so far, of any paired head swapping dataset or benchmark. In this paper, we propose an image-based head swapping framework (HS-Diffusion), which consists of a semantic-guided latent diffusion model (SG-LDM) and a semantic layout generator. We blend the semantic layouts of the source head and source body, and then inpaint the transition region with the semantic layout generator, achieving a coarse-grained head swap. SG-LDM then performs fine-grained head swapping conditioned on the blended layout through a progressive fusion process, while preserving the source head and source body with high-quality reconstruction. To support this, we design a head-cover augmentation strategy for training and a neck alignment trick for geometric realism. Importantly, we construct a new image-based head swapping benchmark and propose two tailor-designed metrics (Mask-FID and Focal-FID). Extensive experiments demonstrate the superiority of our framework. The code will be available at https://github.com/qinghew/HS-Diffusion.
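The coarse layout-blending step can be pictured on tiny label maps. This is a hypothetical sketch, not HS-Diffusion's code: the integer class IDs (background, head, neck, body) and the rule that head or neck pixels left uncovered after blending are marked as an "inpaint" region are our own assumptions, and the semantic layout generator that actually fills that region is not modeled.

```python
# Hypothetical semantic class IDs for this sketch.
BACKGROUND, HEAD, NECK, BODY = 0, 1, 2, 3
INPAINT = -1  # transition pixels left for the layout generator to fill


def blend_layouts(head_layout, body_layout):
    """Take head pixels from the head source and body pixels from the body source.

    Pixels that were head or neck in either source but are covered by neither
    kept region are marked INPAINT; everything else becomes background.
    """
    blended = []
    for head_row, body_row in zip(head_layout, body_layout):
        row = []
        for h, b in zip(head_row, body_row):
            if h == HEAD:
                row.append(HEAD)
            elif b == BODY:
                row.append(BODY)
            elif h == NECK or b in (HEAD, NECK):
                row.append(INPAINT)
            else:
                row.append(BACKGROUND)
        blended.append(row)
    return blended
```

In this toy version, the INPAINT band between the pasted head and the kept body is the transition region that the abstract says is filled by the semantic layout generator before SG-LDM is conditioned on the result.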
Positive-Unlabeled (PU) learning aims to learn a model from rare positive samples and abundant unlabeled samples. Compared with classical binary classification, PU learning is much more challenging because many data instances are incompletely annotated. Since only some of the most confident positive samples are labeled and the evidence is insufficient to categorize the remaining samples, many of the unlabeled samples may also be positive. Research on this topic is particularly useful for many real-world tasks in which labeling is very expensive. For example, recognition tasks in disease diagnosis, recommendation systems, and satellite image recognition may have only a few positive samples that experts can annotate. Existing methods largely ignore the intrinsic hardness of some unlabeled data, which can lead to sub-optimal performance because the model fits the easy, noisy data and does not sufficiently exploit the hard data. In this paper, we focus on improving the commonly used nnPU with a novel training pipeline. We highlight the intrinsic difference in hardness among samples in the dataset and the proper learning strategies for easy and hard data. Accordingly, we propose first splitting the unlabeled dataset with an early-stop strategy: samples whose predictions are inconsistent between the temporary model and the base model are treated as hard samples. The model then uses a noise-tolerant Jensen-Shannon divergence loss for easy data, and a dual-source consistency regularization for hard data, which consists of a cross-consistency between the student and base models for low-level features and a self-consistency for high-level features and predictions.
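For reference, nnPU (Kiryo et al., 2017) trains with the non-negative risk estimate π·R_p^+ + max(0, R_u^- − π·R_p^-), where π is the class prior. The sketch below computes that risk with a sigmoid surrogate loss and adds a hypothetical easy/hard split in which "inconsistent predictions" is taken to mean that the temporary and base models disagree on the sign of the score. That interpretation is our assumption; the early-stop schedule, the Jensen-Shannon loss, and the consistency regularizers are not shown.

```python
import math


def sigmoid_loss(z, y):
    """Sigmoid surrogate loss l(z, y) = 1 / (1 + exp(y * z)), written overflow-safely."""
    return 0.5 * (1.0 - math.tanh(y * z / 2.0))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def nnpu_risk(scores_pos, scores_unl, prior):
    """Non-negative PU risk: prior * R_p^+ + max(0, R_u^- - prior * R_p^-)."""
    r_p_plus = mean(sigmoid_loss(z, +1) for z in scores_pos)   # positives classified as positive
    r_p_minus = mean(sigmoid_loss(z, -1) for z in scores_pos)  # positives treated as negative
    r_u_minus = mean(sigmoid_loss(z, -1) for z in scores_unl)  # unlabeled treated as negative
    # Clamping the estimated negative-class risk at zero is what makes the estimator non-negative.
    return prior * r_p_plus + max(0.0, r_u_minus - prior * r_p_minus)


def split_easy_hard(temp_scores, base_scores):
    """Hypothetical split: hard = unlabeled samples whose predicted sign differs between models."""
    easy, hard = [], []
    for idx, (t, b) in enumerate(zip(temp_scores, base_scores)):
        (hard if (t > 0) != (b > 0) else easy).append(idx)
    return easy, hard
```

When the unlabeled data looks very negative and the labeled positives look very positive, the bracketed term would go below zero; the `max(0.0, ...)` clamp stops the model from exploiting that negative risk, which is the overfitting problem nnPU was designed to fix.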
Information Extraction (IE) aims to extract structured information from heterogeneous sources. IE from natural language text includes sub-tasks such as Named Entity Recognition (NER), Relation Extraction (RE), and Event Extraction (EE). Most IE systems require a comprehensive understanding of sentence structure, implied semantics, and domain knowledge to perform well; thus, IE tasks typically need adequate external resources and annotations. However, obtaining more human annotations takes time and effort. Low-Resource Information Extraction (LRIE) strives to use unlabeled data, reducing the required resources and human annotation. In practice, existing systems either use self-training schemes to generate pseudo labels, which causes the gradual drift problem, or leverage consistency regularization methods, which inevitably suffer from confirmation bias. To alleviate the confirmation bias caused by the lack of feedback loops in existing LRIE learning paradigms, we develop a Gradient Imitation Reinforcement Learning (GIRL) method that encourages pseudo-labeled data to imitate the gradient descent direction on labeled data, forcing pseudo-labeled data to achieve optimization capabilities similar to those of labeled data. Based on how well the pseudo-labeled data imitates the instructive gradient descent direction obtained from labeled data, we design a reward to quantify the imitation process and bootstrap the optimization capability of pseudo-labeled data through trial and error. Beyond the learning paradigm, GIRL is not limited to specific sub-tasks: we apply GIRL to all IE sub-tasks (named entity recognition, relation extraction, and event extraction) in low-resource settings (semi-supervised IE and few-shot IE).
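One simple way to quantify "how well the pseudo-labeled data imitates the gradient descent direction on labeled data" is the cosine similarity between the two gradient vectors. The sketch below assumes that measure as the reward and adds a hypothetical threshold-based filter; GIRL's actual reward definition and its reinforcement-learning loop may differ, and gradients are given here as plain lists rather than computed from a model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two gradient vectors (0.0 if either is all zeros)."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def imitation_reward(grad_labeled, grad_pseudo):
    """Assumed reward in [-1, 1]: 1 means the pseudo-labeled gradient points the same way."""
    return cosine_similarity(grad_labeled, grad_pseudo)


def accept_pseudo_labels(grad_labeled, candidates, threshold=0.5):
    """Hypothetical filter: keep (name, gradient) candidates whose reward exceeds threshold."""
    return [name for name, grad in candidates
            if imitation_reward(grad_labeled, grad) > threshold]
```

A pseudo label whose gradient points away from the labeled-data direction receives a low or negative reward, giving the feedback signal that the abstract says plain self-training lacks.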
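A scene graph is a semantic representation that expresses the objects in a scene, their attributes, and the relationships between them. Scene graphs play an important role in many cross-modal tasks because they can capture the interactions between images and text. In this paper, we focus on scene graph modification (SGM), in which a system must learn how to update an existing scene graph based on a natural language query. Unlike previous methods that rebuild the entire scene graph, we frame SGM as a graph expansion task by introducing incremental structure expansion (ISE). ISE constructs the target graph by incrementally expanding the source graph without changing the unmodified structure. Based on ISE, we further propose a model that iterates between node prediction and edge prediction, progressively inferring more accurate and coherent expansion decisions. In addition, we construct a challenging dataset that contains more complex queries and larger scene graphs than existing datasets. Experiments on four benchmarks demonstrate the effectiveness of our approach, which surpasses previous state-of-the-art models.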
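The ISE loop, which alternates node prediction and edge prediction while never touching existing structure, can be sketched with pluggable predictors. In this minimal illustration the `SceneGraph` class and the `predict_node` and `predict_edges` callables are hypothetical stand-ins for the model's learned components. The invariant the sketch enforces is that the source graph's nodes and edges all survive unchanged in the output.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

Edge = Tuple[int, str, int]  # (subject node id, relation label, object node id)


@dataclass
class SceneGraph:
    nodes: Dict[int, str] = field(default_factory=dict)  # node id -> object/attribute label
    edges: Set[Edge] = field(default_factory=set)

    def copy(self) -> "SceneGraph":
        return SceneGraph(dict(self.nodes), set(self.edges))


def incremental_expand(
    source: SceneGraph,
    predict_node: Callable[[SceneGraph], Optional[str]],
    predict_edges: Callable[[SceneGraph, int], List[Edge]],
    max_steps: int = 10,
) -> SceneGraph:
    """Grow a copy of source by alternating node and edge prediction.

    Existing nodes and edges are never removed or relabeled; the loop stops
    when predict_node returns None (no further expansion) or after max_steps.
    """
    graph = source.copy()
    for _ in range(max_steps):
        label = predict_node(graph)
        if label is None:
            break
        new_id = max(graph.nodes, default=-1) + 1
        graph.nodes[new_id] = label
        for edge in predict_edges(graph, new_id):
            graph.edges.add(edge)
    return graph
```

For a query such as "the man is also wearing a hat", a scripted node predictor that emits `"hat"` once and an edge predictor that links the man to each new node would yield the original graph plus one node and one `wearing` edge.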
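As the first session-level Chinese dataset, Chase contains two separate parts: 2,003 sessions manually constructed from scratch (Chase-C) and 3,456 sessions translated from the English SParC (Chase-T). We find that these two parts are highly discrepant and incompatible as training and evaluation data. In this work, we present SeSQL, yet another large-scale session-level text-to-SQL dataset in Chinese, consisting of 5,028 sessions, all manually constructed from scratch. To guarantee data quality, we adopt an iterative annotation workflow that facilitates intensive and timely review of previously annotated natural language (NL) questions and SQL queries. Moreover, by completing all context-dependent NL questions, we obtain 27,012 context-independent question/SQL pairs, which allows SeSQL to serve as the largest dataset for single-round multi-DB text-to-SQL parsing. We conduct benchmark session-level text-to-SQL parsing experiments on SeSQL with three competitive session-level parsers and present a detailed analysis.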
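Generalizability to unseen databases is essential for text-to-SQL systems, which aim to parse human utterances into SQL statements. Existing work achieves this by using exact-matching methods to identify lexical matches between question words and schema items. However, these methods fail in other challenging scenarios, such as synonym substitution, where the surface forms of corresponding question words and schema items differ. In this paper, we propose a framework named ISESL-SQL that iteratively builds a semantic-enhanced schema-linking graph between question tokens and the database schema. First, we extract a schema-linking graph from a pretrained language model (PLM) through a probing procedure in an unsupervised manner. The schema-linking graph is then further optimized during training with a deep graph learning method. Meanwhile, we also design an auxiliary task called graph regularization to improve the schema information mentioned in the schema-linking graph. Extensive experiments on three benchmarks show that ISESL-SQL consistently outperforms the baselines, and further studies demonstrate its generalizability and robustness.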
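The weakness of exact matching that motivates ISESL-SQL is easy to see in a toy version of the baseline. This sketch is our own simplified exact-match linker, not code from any existing system: it links a question token to a schema item only when the token literally equals one of the words in the item's name (split on `.`, `_`, and whitespace). The example schema and question are hypothetical.

```python
import re


def exact_match_links(question_tokens, schema_items):
    """Return {(token index, schema item)} for tokens that exactly match a word of the item name."""
    links = set()
    for idx, token in enumerate(question_tokens):
        for item in schema_items:
            words = re.split(r"[._\s]+", item.lower())  # "singer.name" -> ["singer", "name"]
            if token.lower() in words:
                links.add((idx, item))
    return links


question = ["show", "the", "name", "of", "each", "vocalist"]
schema = ["singer.name", "singer.age", "concert.year"]
links = exact_match_links(question, schema)
```

Here only `name` is linked (to `singer.name`); `vocalist` is never connected to the `singer` table, which is exactly the synonym-substitution failure that a semantic, PLM-probed schema-linking graph is meant to recover.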